Studying Pronunciation Changes with gamms

Josef Fruehwald

18 October 2017

The basics of vowels

The articulation of vowels

Vowels are mostly articulated with the tongue body.

The articulation of vowels.

Vowels can be largely described in a two dimensional space.

The acoustics of vowels.

These tongue shapes have characteristic acoustics.

The measurement of vowels.

These characteristic acoustics can be measured.

The number of vowels

Complex Vowels

Some vowels have complex articulations.

Complex Vowels

Researching Vowels

If you know where in a recording a vowel is…

Researching Vowels

…you can measure them

Researching Vowels

…average them

Researching Vowels

… compare between speakers

Researching Vowels

Brief Aside - Tidyverse

My Directory Structure

- Field Site A
    - Speaker A-1
        - Speaker A-1 audio
        - Speaker A-1 annotation
        - Speaker A-1 data
    - Speaker A-2
        - ...
    - ...
- Field Site B
- ...

Loading the data into R

library(purrr)
library(dplyr)

# Glob for file names
filenames <- Sys.glob("data/PH*/PH*/*meas.txt")

# Define the reading function
read_iy <- function(filename){
  dat <- read.delim(filename)
  out <- subset(dat, plt_vclass == "iy")
  return(out)
}

# Map
iy_df_list <- map(filenames, read_iy)

# Combine
iy_df <- bind_rows(iy_df_list)

Changes in Pronunciation

Vowel Sounds Shift

Modelling Shifting Vowel Sounds

There are a few desiderata for modelling vowel shifts like these:

Modelling Shifting Vowel Sounds

Inter-speaker differences.

Autocorrelation

GAMMs

Generalized Additive Mixed Models, fit with the mgcv package, meet all of these desiderata.

Non-linear models

The non-linear models fit with gamms utilize various flavors of “splines.” They all have slightly different mathematical properties, but there are a few things they have in common. First, they all involve a “basis”: a set of simple component functions, such as polynomials.

Non-linear models

Each of these basis polynomials is then weighted.

Non-linear models

Then, you sum over the weighted polynomials, resulting in the final smooth.
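The basis-weights-sum idea can be sketched with the base splines package (mgcv constructs its bases differently, and the weights below are invented purely for illustration):

```r
library(splines)

x <- seq(0, 1, length = 100)

# A cubic B-spline basis: 8 simple component functions evaluated at x
B <- bs(x, df = 8)

# One (invented) weight per basis function
w <- c(1, 3, -2, 0.5, 2, -1, 0.2, 1.5)

# Weight each basis function and sum: the final smooth
smooth <- B %*% w
plot(x, smooth, type = "l")
```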

Non-linear models

There are a few different points of control you have over the splines.

The size of the basis (a.k.a. the number of “knots”)

The more knots, the more “wiggly” the fits can be.

Non-linear models

There are a few different points of control you have over the splines.

The “degree” of the basis.

Fitting a GAMM

The simplest gam

library(mgcv)
ay_simple <- bam(F1 ~ s(t_prop), data = ay0_df)
summary(ay_simple)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## F1 ~ s(t_prop)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 600.8100     0.1956    3072   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df    F p-value    
## s(t_prop) 8.66  8.964 8212  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.182   Deviance explained = 18.2%
## fREML = 2.0257e+06  Scale est. = 12612     n = 329910

The simplest gam

itsadug is an excellent helper package for evaluating gams.

library(itsadug)
plot_smooth(ay_simple, view = "t_prop", rug = F, print.summary = F)

More Complex

ay_sex <- bam(F1 ~ sex + s(t_prop, by = sex), data = ay0_df)
summary(ay_sex)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## F1 ~ sex + s(t_prop, by = sex)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 549.6168     0.2611  2105.1   <2e-16 ***
## sexf         94.5323     0.3548   266.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                  edf Ref.df    F p-value    
## s(t_prop):sexm 8.068  8.755 3302  <2e-16 ***
## s(t_prop):sexf 8.650  8.962 7135  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.332   Deviance explained = 33.2%
## fREML = 1.9924e+06  Scale est. = 10305     n = 329910

More Complex

plot_smooth(ay_sex, view = "t_prop", plot_all = "sex", rug = F, print.summary = F)

Smoothing across 2 dimensions

It is possible to use the same s() function to define a smooth over two dimensions:

s(t_prop, dob, by = sex)

But the penalizing properties of s() won’t play nicely with the fact that t_prop and dob are on radically different scales. For this, we need to use te() to define a tensor product smooth.

te(t_prop, dob, by = sex)

Smoothing across 2 dimensions

ay_2dim <- bam(F1 ~ sex + te(t_prop, dob, by = sex), data = ay0_df)
summary(ay_2dim)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## F1 ~ sex + te(t_prop, dob, by = sex)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 550.6695     0.2547  2162.2   <2e-16 ***
## sexf         91.3726     0.3459   264.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                       edf Ref.df    F p-value    
## te(t_prop,dob):sexm 22.90  23.86 1477  <2e-16 ***
## te(t_prop,dob):sexf 23.79  23.99 3548  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.373   Deviance explained = 37.3%
## fREML = 1.9821e+06  Scale est. = 9675.8    n = 329910

Smoothing across 2 dimensions

library(ggthemes)  # provides ptol_pal()

plot_smooth(ay_2dim, 
              view = "t_prop", 
              cond = list(dob = 1900, sex = 'f'),
              rug = F,
              col = ptol_pal()(2)[1],
              print.summary = F,
              ylim = c(450, 900))
plot_smooth(ay_2dim, 
              view = "t_prop", 
              cond = list(dob = 1990, sex = 'f'),
              rug = F,
              col = ptol_pal()(2)[2],
              print.summary = F,
              ylim = c(450, 900),
              add = T)

Plotting

ay_2dim_pred <- get_predictions(ay_2dim, 
                                cond = list(t_prop = seq(0,1,length = 50),
                                            dob = c(1900, 1950, 1990),
                                            sex = c("f", "m")),
                                print.summary = F)
ay_2dim_pred$fit <- ay_2dim_pred$fit[,1]
head(ay_2dim_pred)
##   sex     t_prop  dob      fit       CI
## 1   f 0.00000000 1900 609.2748 6.129398
## 2   m 0.00000000 1900 562.4466 7.875854
## 3   f 0.02040816 1900 627.2749 5.407234
## 4   m 0.02040816 1900 571.4634 6.971127
## 5   f 0.04081633 1900 645.1702 4.761972
## 6   m 0.04081633 1900 580.4170 6.162962

Plotting

library(ggplot2)
library(ggthemes)  # scale_color_ptol(), scale_fill_ptol()

ay_2dim_pred %>%
  mutate(dob = factor(dob))%>%
  ggplot(aes(t_prop, fit))+
    geom_line(aes(color = dob))+
    geom_ribbon(aes(ymin = fit - (2 * CI),
                    ymax = fit + (2 * CI),
                    fill = dob), 
                alpha = 0.6)+
    scale_color_ptol()+
    scale_fill_ptol()+  
    facet_wrap(~sex)+
    theme_bw()

Random Effects

We can include both random intercepts and random smooths.

Random Intercepts

Random intercepts capture the fact that, even after accounting for a speaker’s date of birth and sex, that speaker may still show a wholesale upward or downward shift in their formant trajectories. At the same time, we want to bias the individual-level estimates toward the group smooth.

s(speaker, bs = 're')

Random smooths

Random smooths take into account that individual speakers may have idiosyncratic trajectories.

s(t_prop, speaker, bs = 'fs', m = 1)

Random Effects

ay_gamm <- bam(F1 ~ sex + 
                    te(t_prop, dob, by=sex)+
                    s(idstring, bs = 're')+
                    s(t_prop, idstring, bs = 'fs', m = 1),
               data = ay0_df)
ay_gamm_pred <- get_predictions(ay_gamm,
                                cond = list(t_prop = seq(0,1,length = 50),
                                            dob = c(1900, 1950, 1990),
                                            sex = c("f", "m")),
                                rm.ranef = T,
                                print.summary = F)
ay_gamm_pred$fit <- ay_gamm_pred$fit[,1]

Plotting

ay_gamm_pred %>%
  mutate(dob = factor(dob))%>%
  ggplot(aes(t_prop, fit))+
    geom_line(aes(color = dob))+
    geom_ribbon(aes(ymin = fit - (2 * CI),
                    ymax = fit + (2 * CI),
                    fill = dob),
                alpha = 0.6)+
    scale_color_ptol()+
    scale_fill_ptol()+
    facet_wrap(~sex)+
    theme_bw()

Autocorrelation

Including random smooths addresses one way the data violate the assumption of independent, identically distributed observations.

The next big problem is that the estimate of F1 at \(t=2\) is not independent of the estimate of F1 at \(t=1\). In fact, the range of possible values of F1 at \(t=2\) is very limited given its value at \(t=1\). They are highly auto-correlated.
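To see what this kind of autocorrelation looks like, here is a toy simulation (not the vowel data) comparing an AR1 series to independent noise:

```r
set.seed(519)

# Each value is 0.8 times the previous value plus noise...
ar1_series <- arima.sim(model = list(ar = 0.8), n = 500)

# ...versus independent, identically distributed noise
iid_series <- rnorm(500)

# The AR1 series shows large autocorrelations at short lags;
# the iid series shows essentially none
acf(ar1_series, lag.max = 15)
acf(iid_series, lag.max = 15)
```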

Autocorrelation

resid_gam(ay_gamm) %>%
  acf(lag.max = 15)

Autocorrelation

It is possible to set up an AR1 model in gamms.

AR1

AR is for “autoregression”. This means you’re trying to predict the value of each data point using the values of the preceding data points. The 1 here means we’re trying to predict each data point with just the immediately preceding data point.

Autocorrelation

Here’s a really stupid AR1 model

library(tidyr)  # for spread()

ay0_df %>%
  filter(n %in% c(1, 2)) %>%
  dplyr::select(idstring, id, F1, n, sex, dob) %>%
  spread(n, F1) %>%
  dplyr::select(sex, dob, `1`, `2`) -> wide_data
head(wide_data)
## # A tibble: 6 x 4
##      sex   dob      `1`      `2`
##   <fctr> <dbl>    <dbl>    <dbl>
## 1      m  1979 485.1198 483.2470
## 2      m  1979 530.3898 529.2495
## 3      m  1979 591.2785 563.8943
## 4      m  1979 562.6196 581.2456
## 5      m  1979 541.6316 556.5180
## 6      m  1979 550.9698 566.1853
lm(`2` ~ `1`*sex, data = wide_data) %>%
  summary()
## 
## Call:
## lm(formula = `2` ~ `1` * sex, data = wide_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -616.94  -21.72   -3.10   18.82  949.28 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 146.226103   2.847221  51.357  < 2e-16 ***
## `1`           0.763266   0.004979 153.289  < 2e-16 ***
## sexf         47.730177   3.599075  13.262  < 2e-16 ***
## `1`:sexf     -0.024506   0.006028  -4.066  4.8e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 50.38 on 21990 degrees of freedom
## Multiple R-squared:  0.7979, Adjusted R-squared:  0.7979 
## F-statistic: 2.894e+04 on 3 and 21990 DF,  p-value: < 2.2e-16

Autocorrelation

To include an autoregressive component into the model, we need to

  1. Indicate where each individual formant track starts.
  2. Tell the model what we expect the autocorrelation to be.

Autocorrelation

start will be TRUE where each individual formant trajectory starts, FALSE elsewhere.

ay0_df <- ay0_df %>%
            group_by(idstring, id) %>%
            mutate(meas_point = 1:n(),
                   start = meas_point == 1)

We’ll use the autocorrelation coefficient at lag = 1 as the estimated autocorrelation.

valRho <- acf(resid_gam(ay_gamm), plot = F)$acf[2]
valRho
## [1] 0.8342334

At this point in the talk, I’m going to revert to models that I’ve “baked earlier” because these take a very long time to fit!

Autocorrelation

The illustration model

ay_AR <- bam(F1 ~ sex + 
                    te(t_prop, dob, by=sex)+
                    s(idstring, bs = 're')+
                    s(t_prop, idstring, bs = 'fs', m = 1),
              AR.start = ay0_df$start,
              rho = valRho,
              data = ay0_df)

The actual model we’re looking at:

ay0_AR_F1 <- bam(F1 ~ sex +
                        te(t_prop, dob, by = sex)+
                        te(t_prop, log2dur, by = sex)+
                        s(word, bs = 're')  +                   
                        s(idstring, bs = 're')+
                        s(t_prop, idstring, bs = 'fs', m = 1), 
                      AR.start = ay0_df$start, 
                      rho = valRhoF1,
                      data = ay0_df)

Autocorrelation

The AR1 model has dramatically reduced the autocorrelation of the residuals.

acf(resid_gam(ay0_AR_F1), lag.max = 15)

Autocorrelation

The End Result

End Result

Sampling from the Posterior

Landmarks

I now have a model for the estimated formant values across generational time:

But there are other things I would like to know.

Sampling from the posterior

We can get estimates and credible intervals for these values by “sampling from the posterior”. We can illustrate with a very simple linear model.

car_model <- lm(dist ~ speed, data = cars)
summary(car_model)
## 
## Call:
## lm(formula = dist ~ speed, data = cars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.069  -9.525  -2.272   9.215  43.201 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.5791     6.7584  -2.601   0.0123 *  
## speed         3.9324     0.4155   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Sampling from the posterior

In this model, there is uncertainty about the intercept, about the slope, and a joint uncertainty about both.

(mu <- coef(car_model))
## (Intercept)       speed 
##  -17.579095    3.932409
(Sigma <- vcov(car_model))
##             (Intercept)      speed
## (Intercept)   45.676514 -2.6588234
## speed         -2.658823  0.1726509

Sampling from the posterior

The trick is, we can treat these coefficients and variance-covariance matrix as a multivariate normal distribution, and sample new hypothetical intercepts and slopes from it.

Sampling from the posterior

library(MASS)
new_params <- mvrnorm(n = 15, mu = mu, Sigma = Sigma)
head(new_params)
##      (Intercept)    speed
## [1,]   -19.77808 4.332695
## [2,]   -29.42593 4.668045
## [3,]   -17.71067 3.884503
## [4,]   -17.21278 3.792104
## [5,]   -24.40994 4.288554
## [6,]   -12.18943 3.814938

Sampling from the posterior

If you take a large enough sample (way more than 15), you can estimate credible intervals from properties of these fitted values.
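For example, with the cars model from above, we can draw many plausible intercept/slope pairs, compute the fitted value at a given speed under each draw, and take quantiles of the simulated fits (a sketch with 10,000 draws):

```r
library(MASS)

car_model <- lm(dist ~ speed, data = cars)

# Draw 10,000 plausible (intercept, slope) pairs
sims <- mvrnorm(n = 10000, mu = coef(car_model), Sigma = vcov(car_model))

# The fitted value at speed = 20 under each simulated pair
fit_at_20 <- sims[, "(Intercept)"] + sims[, "speed"] * 20

# A 95% credible interval from the quantiles of the simulated fits
quantile(fit_at_20, probs = c(0.025, 0.975))
```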

Sampling from the posterior

ay_mu <- coef(ay0_AR_F1)
ay_Sigma <- vcov(ay0_AR_F1)

These are huge, because of all of the random effects parameters.

str(ay_Sigma)
##  num [1:4002, 1:4002] 19.77 -13.463 0.371 3.045 6.042 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:4002] "(Intercept)" "sexm" "te(t_prop,dob):sexf.1" "te(t_prop,dob):sexf.2" ...
##   ..$ : chr [1:4002] "(Intercept)" "sexm" "te(t_prop,dob):sexf.1" "te(t_prop,dob):sexf.2" ...

I’m going to just select them away:

ay_mu <- ay_mu[!grepl("(idstring)|(word)", names(ay_mu))]
ay_Sigma <- ay_Sigma[!grepl("(idstring)|(word)", rownames(ay_Sigma)),
                     !grepl("(idstring)|(word)", colnames(ay_Sigma))]
str(ay_Sigma)
##  num [1:90, 1:90] 19.77 -13.463 0.371 3.045 6.042 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:90] "(Intercept)" "sexm" "te(t_prop,dob):sexf.1" "te(t_prop,dob):sexf.2" ...
##   ..$ : chr [1:90] "(Intercept)" "sexm" "te(t_prop,dob):sexf.1" "te(t_prop,dob):sexf.2" ...

Sampling from the posterior

Sampling “new” parameters is done like so:

ay_sims <- mvrnorm(n = 1000, mu = ay_mu, Sigma = ay_Sigma)
str(ay_sims)
##  num [1:1000, 1:90] 661 663 658 665 661 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:90] "(Intercept)" "sexm" "te(t_prop,dob):sexf.1" "te(t_prop,dob):sexf.2" ...

Now, we need to multiply them by the “model matrix.” This part might be a bit opaque for people unfamiliar with the linear algebra definition of regression.
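The core identity is easy to check on the simple cars model: the model matrix has one row per observation and one column per coefficient, and multiplying it by the coefficient vector reproduces the model’s fitted values.

```r
car_model <- lm(dist ~ speed, data = cars)

# One row per observation, one column per coefficient
X <- model.matrix(car_model)

# X %*% beta gives exactly the fitted values
manual_fit <- X %*% coef(car_model)
all.equal(as.vector(manual_fit), unname(fitted(car_model)))
## [1] TRUE
```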

Sampling from the posterior

# Create a prediction data frame
ay_pred <- expand.grid(dob = 1900:1990,
                       t_prop = seq(0,1,length = 500),
                       sex = "f",
                       log2dur = -3.32,
                       word = "LIKE",
                       idstring = "speaker1")

# Get the model matrix
ay_pred_matrix <- predict(ay0_AR_F1, newdata = ay_pred, type = "lpmatrix")

# Eliminate random effects
ay_pred_matrix <- ay_pred_matrix[,!grepl("(idstring)|(word)", colnames(ay_pred_matrix))]

Sampling from the posterior

Now, you multiply the model matrix by the transposed simulated parameters.

ay_sim_fit <- ay_pred_matrix %*% t(ay_sims)
ay_sim_fit[1:6, 1:10]
##       [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
## 1 619.6415 627.6058 613.6418 610.7431 615.3648 601.5065 618.7092 619.8936
## 2 621.7914 628.8649 615.8755 612.7995 616.8847 603.8937 620.8953 621.3623
## 3 623.9034 630.1044 618.0664 614.8216 618.3799 606.2471 623.0410 622.8035
## 4 625.9743 631.3228 620.2108 616.8067 619.8485 608.5639 625.1432 624.2148
## 5 628.0010 632.5184 622.3051 618.7518 621.2883 610.8413 627.1983 625.5940
## 6 629.9803 633.6896 624.3459 620.6541 622.6972 613.0766 629.2031 626.9388
##       [,9]    [,10]
## 1 608.4402 604.4825
## 2 610.7276 607.3058
## 3 612.9755 610.0771
## 4 615.1808 612.7921
## 5 617.3400 615.4463
## 6 619.4501 618.0355

Each row corresponds to a row in the prediction data frame, and each column is a simulated fit from the posterior.
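That structure makes pointwise intervals easy: take quantiles across the draws within each row. A toy matrix stands in for ay_sim_fit here:

```r
set.seed(519)

# Toy stand-in for ay_sim_fit: 5 prediction points by 1000 posterior draws
sim_fit <- matrix(rnorm(5 * 1000, mean = 600, sd = 10), nrow = 5)

# 95% pointwise bounds: quantiles over the draws within each row
ci <- apply(sim_fit, 1, quantile, probs = c(0.025, 0.975))
t(ci)
```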

Doing Interesting Things

# Coerce into a data frame
ay_sim_fit_df <- data.frame(ay_sim_fit)

# 1. Paste the columns on to the prediction frame
# 2. Go from wide to long
ay_pred_long <- ay_pred%>%
                  bind_cols(ay_sim_fit_df) %>%
                  gather(replicate, value, X1:X1000)%>%
                  dplyr::select(-word, -idstring)

head(ay_pred_long)
##    dob t_prop sex log2dur replicate    value
## 1 1900      0   f   -3.32        X1 619.6415
## 2 1901      0   f   -3.32        X1 621.7914
## 3 1902      0   f   -3.32        X1 623.9034
## 4 1903      0   f   -3.32        X1 625.9743
## 5 1904      0   f   -3.32        X1 628.0010
## 6 1905      0   f   -3.32        X1 629.9803

Doing Interesting Things

ay_max_f1 <- ay_pred_long %>%
              group_by(dob, sex, replicate) %>%
              filter(value == max(value))
head(ay_max_f1)
## Source: local data frame [6 x 6]
## Groups: dob, sex, replicate [6]
## 
## # A tibble: 6 x 6
##     dob    t_prop    sex log2dur replicate    value
##   <int>     <dbl> <fctr>   <dbl>     <chr>    <dbl>
## 1  1990 0.3066132      f   -3.32        X1 685.8463
## 2  1988 0.3086172      f   -3.32        X1 687.6547
## 3  1989 0.3086172      f   -3.32        X1 686.7452
## 4  1987 0.3106212      f   -3.32        X1 688.5743
## 5  1986 0.3126253      f   -3.32        X1 689.5045
## 6  1985 0.3146293      f   -3.32        X1 690.4456

Doing Interesting Things

ggplot(ay_max_f1, aes(dob, t_prop))+
  geom_point(alpha = 0.01)+
  facet_wrap(~sex)+
  theme_bw()

Getting actual Credible Intervals

library(coda)
ay_max_f1 %>%
  group_by(dob, sex)%>%
  nest()%>%
  mutate(t_hpd = map(data, ~.x$t_prop %>% as.mcmc %>% HPDinterval),
         t_hpd_df = map(t_hpd, data.frame))%>%
  unnest(t_hpd_df)%>%
  ggplot(aes(dob))+
    geom_ribbon(aes(ymin = lower, ymax = upper, fill = sex), alpha = 0.8)+
    scale_fill_ptol()+
    theme_bw()

Getting actual Credible Intervals

many_hpds <- function(x){
  probs <- seq(0.1, 0.95, length = 20)
  x_mcmc <- as.mcmc(x)
  hpd_list <- map(probs, ~HPDinterval(x_mcmc, prob = .x))
  hpd_list <- map(hpd_list, data.frame)
  hpd_df <- bind_rows(hpd_list) %>% mutate(prob = probs)
  return(hpd_df)
}
ay_max_f1 %>%
  group_by(dob, sex)%>%
  nest()%>%
  mutate(t_hpds = map(data, ~.x$t_prop %>% many_hpds)) %>%
  unnest(t_hpds)%>%
  mutate(group = paste0(sex, prob))%>%
  ggplot(aes(dob))+
    geom_ribbon(aes(ymin = lower, ymax = upper, group = group, fill = sex), alpha = 0.15)+
    scale_fill_ptol()+
    ylab("propoprtional time")+
    theme_bw()

Thanks!